Synchronization (computer Science)
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, synchronization refers to one of two distinct but related concepts: synchronization of processes, and synchronization of
data In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted ...
. ''Process synchronization'' refers to the idea that multiple processes are to join up or
handshake A handshake is a globally widespread, brief greeting or parting tradition in which two people grasp one of each other's like hands, in most cases accompanied by a brief up-and-down movement of the grasped hands. Customs surrounding handshakes a ...
at a certain point, in order to reach an agreement or commit to a certain sequence of action. ''
Data synchronization Data synchronization is the process of establishing consistency between source and target data stores, and the continuous harmonization of the data over time. It is fundamental to a wide variety of applications, including file synchronization and m ...
'' refers to the idea of keeping multiple copies of a dataset in coherence with one another, or to maintain
data integrity Data integrity is the maintenance of, and the assurance of, data accuracy and consistency over its entire Information Lifecycle Management, life-cycle and is a critical aspect to the design, implementation, and usage of any system that stores, proc ...
. Process synchronization primitives are commonly used to implement data synchronization.


The need for synchronization

The need for synchronization does not arise merely in multi-processor systems but for any kind of concurrent processes; even in single processor systems. Mentioned below are some of the main needs for synchronization: '' Forks and Joins:'' When a job arrives at a fork point, it is split into N sub-jobs which are then serviced by n tasks. After being serviced, each sub-job waits until all other sub-jobs are done processing. Then, they are joined again and leave the system. Thus, parallel programming requires synchronization as all the parallel processes wait for several other processes to occur. '' Producer-Consumer:'' In a producer-consumer relationship, the consumer process is dependent on the producer process till the necessary data has been produced. ''Exclusive use resources:'' When multiple processes are dependent on a resource and they need to access it at the same time, the operating system needs to ensure that only one processor accesses it at a given point in time. This reduces concurrency.


Thread or process synchronization

Thread synchronization is defined as a mechanism which ensures that two or more concurrent processes or threads do not simultaneously execute some particular program segment known as
critical section In concurrent programming, concurrent accesses to shared resources can lead to unexpected or erroneous behavior, so parts of the program where the shared resource is accessed need to be protected in ways that avoid the concurrent access. One way to ...
. Processes' access to critical section is controlled by using synchronization techniques. When one thread starts executing the
critical section In concurrent programming, concurrent accesses to shared resources can lead to unexpected or erroneous behavior, so parts of the program where the shared resource is accessed need to be protected in ways that avoid the concurrent access. One way to ...
(serialized segment of the program) the other thread should wait until the first thread finishes. If proper synchronization techniques are not applied, it may cause a
race condition A race condition or race hazard is the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events. It becomes a bug when one or more of t ...
where the values of variables may be unpredictable and vary depending on the timings of context switches of the processes or threads. For example, suppose that there are three processes, namely 1, 2, and 3. All three of them are concurrently executing, and they need to share a common resource (critical section) as shown in Figure 1. Synchronization should be used here to avoid any conflicts for accessing this shared resource. Hence, when Process 1 and 2 both try to access that resource, it should be assigned to only one process at a time. If it is assigned to Process 1, the other process (Process 2) needs to wait until Process 1 frees that resource (as shown in Figure 2). Another synchronization requirement which needs to be considered is the order in which particular processes or threads should be executed. For example, one cannot board a plane before buying a ticket. Similarly, one cannot check e-mails before validating the appropriate credentials (for example, user name and password). In the same way, an ATM will not provide any service until it receives a correct PIN. Other than mutual exclusion, synchronization also deals with the following: *
deadlock In concurrent computing, deadlock is any situation in which no member of some group of entities can proceed because each waits for another member, including itself, to take action, such as sending a message or, more commonly, releasing a lo ...
, which occurs when many processes are waiting for a shared resource (critical section) which is being held by some other process. In this case, the processes just keep waiting and execute no further; *
starvation Starvation is a severe deficiency in caloric energy intake, below the level needed to maintain an organism's life. It is the most extreme form of malnutrition. In humans, prolonged starvation can cause permanent organ damage and eventually, dea ...
, which occurs when a process is waiting to enter the critical section, but other processes monopolize the critical section, and the first process is forced to wait indefinitely; *
priority inversion In computer science, priority inversion is a scenario in scheduling in which a high priority task is indirectly superseded by a lower priority task effectively inverting the assigned priorities of the tasks. This violates the priority model that h ...
, which occurs when a high-priority process is in the critical section, and it is interrupted by a medium-priority process. This violation of priority rules can happen under certain circumstances and may lead to serious consequences in real-time systems; *
busy waiting In computer science and software engineering, busy-waiting, busy-looping or spinning is a technique in which a process repeatedly checks to see if a condition is true, such as whether keyboard input or a lock is available. Spinning can also be use ...
, which occurs when a process frequently polls to determine if it has access to a critical section. This frequent polling robs processing time from other processes.


Minimizing synchronization

One of the challenges for exascale algorithm design is to minimize or reduce synchronization. Synchronization takes more time than computation, especially in distributed computing. Reducing synchronization drew attention from computer scientists for decades. Whereas it becomes an increasingly significant problem recently as the gap between the improvement of computing and latency increases. Experiments have shown that (global) communications due to synchronization on distributed computers takes a dominated share in a sparse iterative solver. This problem is receiving increasing attention after the emergence of a new benchmark metric, the High Performance Conjugate Gradient(HPCG), for ranking the top 500 supercomputers.


Classic problems of synchronization

The following are some classic problems of synchronization: * The Producer–Consumer Problem (also called The Bounded Buffer Problem); * The Readers–Writers Problem; * The Dining Philosophers Problem. These problems are used to test nearly every newly proposed synchronization scheme or primitive.


Hardware synchronization

Many systems provide hardware support for
critical section In concurrent programming, concurrent accesses to shared resources can lead to unexpected or erroneous behavior, so parts of the program where the shared resource is accessed need to be protected in ways that avoid the concurrent access. One way to ...
code. A single processor or
uniprocessor system A uniprocessor system is defined as a computer system that has a single central processing unit that is used to execute computer tasks. As more and more modern software is able to make use of multiprocessing architectures, such as SMP and MPP, t ...
could disable
interrupt In digital computers, an interrupt (sometimes referred to as a trap) is a request for the processor to ''interrupt'' currently executing code (when permitted), so that the event can be processed in a timely manner. If the request is accepted, ...
s by executing currently running code without preemption, which is very inefficient on
multiprocessor Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor or the ability to allocate tasks between them. There ar ...
systems. "The key ability we require to implement synchronization in a multiprocessor is a set of hardware primitives with the ability to atomically read and modify a memory location. Without such a capability, the cost of building basic synchronization primitives will be too high and will increase as the processor count increases. There are a number of alternative formulations of the basic hardware primitives, all of which provide the ability to atomically read and modify a location, together with some way to tell if the read and write were performed atomically. These hardware primitives are the basic building blocks that are used to build a wide variety of user-level synchronization operations, including things such as
locks Lock(s) may refer to: Common meanings *Lock and key, a mechanical device used to secure items of importance *Lock (water navigation), a device for boats to transit between different levels of water, as in a canal Arts and entertainment * ''Lock ...
and barriers. In general, architects do not expect users to employ the basic hardware primitives, but instead expect that the primitives will be used by system programmers to build a synchronization library, a process that is often complex and tricky." Many modern hardware provides special atomic hardware instructions by either
test-and-set In computer science, the test-and-set instruction is an instruction used to write (set) 1 to a memory location and return its old value as a single atomic (i.e., non-interruptible) operation. The caller can then "test" the result to see if the stat ...
the memory word or
compare-and-swap In computer science, compare-and-swap (CAS) is an atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of tha ...
contents of two memory words.


Synchronization strategies in programming languages

In
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
, to prevent thread interference and memory consistency errors, blocks of code are wrapped into synchronized ''(lock_object)'' sections. This forces any thread to acquire the said lock object before it can execute the block. The lock is automatically released when the thread which acquired the lock, and is then executing the block, leaves the block or enters the waiting state within the block. Any variable updates, made by a thread in a synchronized block, become visible to other threads when they similarly acquire the lock and execute the block. Java ''synchronized'' blocks, in addition to enabling mutual exclusion and memory consistency, enable signaling—i.e., sending events from threads which have acquired the lock and are executing the code block to those which are waiting for the lock within the block. This means that Java synchronized sections combine functionality of mutexes and events. Such primitive is known as synchronization monitor. Any object may be used as a lock/monitor in Java. The declaring object is a lock object when the whole method is marked with ''synchronized''. The
.NET Framework The .NET Framework (pronounced as "''dot net"'') is a proprietary software framework developed by Microsoft that runs primarily on Microsoft Windows. It was the predominant implementation of the Common Language Infrastructure (CLI) until bein ...
has synchronization primitives. "Synchronization is designed to be cooperative, demanding that every thread or process follow the synchronization mechanism before accessing protected resources (critical section) for consistent results." In .NET, locking, signaling, lightweight synchronization types, spinwait and interlocked operations are some of mechanisms related to synchronization.


Implementation of Synchronization


Spinlock

Another effective way of implementing synchronization is by using spinlocks. Before accessing any shared resource or piece of code, every processor checks a flag. If the flag is reset, then the processor sets the flag and continues executing the thread. But, if the flag is set (locked), the threads would keep spinning in a loop and keep checking if the flag is set or not. But, spinlocks are effective only if the flag is reset for lower cycles otherwise it can lead to performance issues as it wastes many processor cycles waiting.


Barriers

Barriers are simple to implement and provide good responsiveness. They are based on the concept of implementing wait cycles to provide synchronization. Consider three threads running simultaneously, starting from barrier 1. After time t, thread1 reaches barrier 2 but it still has to wait for thread 2 and 3 to reach barrier2 as it does not have the correct data. Once all the threads reach barrier 2 they all start again. After time t, thread 1 reaches barrier3 but it will have to wait for threads 2 and 3 and the correct data again. Thus, in barrier synchronization of multiple threads there will always be a few threads that will end up waiting for other threads as in the above example thread 1 keeps waiting for thread 2 and 3. This results in severe degradation of the process performance. The barrier synchronization wait function for ith thread can be represented as: (Wbarrier)i = f ((Tbarrier)i, (Rthread)i) Where Wbarrier is the wait time for a thread, Tbarrier is the number of threads has arrived, and Rthread is the arrival rate of threads. Experiments show that 34% of the total execution time is spent in waiting for other slower threads.


Semaphores

Semaphores are signalling mechanisms which can allow one or more threads/processors to access a section. A Semaphore has a flag which has a certain fixed value associated with it and each time a thread wishes to access the section, it decrements the flag. Similarly, when the thread leaves the section, the flag is incremented. If the flag is zero, the thread cannot access the section and gets blocked if it chooses to wait. Some semaphores would allow only one thread or process in the code section. Such Semaphores are called binary semaphore and are very similar to Mutex. Here, if the value of semaphore is 1, the thread is allowed to access and if the value is 0, the access is denied.


Mathematical foundations

Synchronization was originally a process-based concept whereby a lock could be obtained on an object. Its primary usage was in databases. There are two types of (file)
lock Lock(s) may refer to: Common meanings *Lock and key, a mechanical device used to secure items of importance *Lock (water navigation), a device for boats to transit between different levels of water, as in a canal Arts and entertainment * ''Lock ...
; read-only and read–write. Read-only locks may be obtained by many processes or threads. Readers–writer locks are exclusive, as they may only be used by a single process/thread at a time. Although locks were derived for file databases, data is also shared in memory between processes and threads. Sometimes more than one object (or file) is locked at a time. If they are not locked simultaneously they can overlap, causing a deadlock exception.
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
and
Ada Ada may refer to: Places Africa * Ada Foah, a town in Ghana * Ada (Ghana parliament constituency) * Ada, Osun, a town in Nigeria Asia * Ada, Urmia, a village in West Azerbaijan Province, Iran * Ada, Karaman, a village in Karaman Province, ...
only have exclusive locks because they are thread based and rely on the
compare-and-swap In computer science, compare-and-swap (CAS) is an atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of tha ...
processor instruction. An abstract mathematical foundation for synchronization primitives is given by the
history monoid In mathematics and computer science, a history monoid is a way of representing the histories of concurrently running computer processes as a collection of strings, each string representing the individual history of a process. The history monoid p ...
. There are also many higher-level theoretical devices, such as
process calculi In computer science, the process calculi (or process algebras) are a diverse family of related approaches for formally modelling concurrent systems. Process calculi provide a tool for the high-level description of interactions, communications, and ...
and
Petri net A Petri net, also known as a place/transition (PT) net, is one of several mathematical modeling languages for the description of distributed systems. It is a class of discrete event dynamic system. A Petri net is a directed bipartite graph that ...
s, which can be built on top of the history monoid.


Synchronization examples

Following are some synchronization examples with respect to different platforms.


Synchronization in Windows

Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for serv ...
provides: * interrupt masks, which protect access to global resources (critical section) on uniprocessor systems; *
spinlock In software engineering, a spinlock is a lock that causes a thread trying to acquire it to simply wait in a loop ("spin") while repeatedly checking whether the lock is available. Since the thread remains active but is not performing a useful task, ...
s, which prevent, in multiprocessor systems, spinlocking-thread from being preempted; *
dynamic dispatch In computer science, dynamic dispatch is the process of selecting which implementation of a polymorphic operation (method or function) to call at run time. It is commonly employed in, and considered a prime characteristic of, object-oriented ...
ers, which act like
mutex In computer science, a lock or mutex (from mutual exclusion) is a synchronization primitive: a mechanism that enforces limits on access to a resource when there are many threads of execution. A lock is designed to enforce a mutual exclusion concur ...
es, semaphores,
event Event may refer to: Gatherings of people * Ceremony, an event of ritual significance, performed on a special occasion * Convention (meeting), a gathering of individuals engaged in some common interest * Event management, the organization of e ...
s, and
timer A timer is a specialized type of clock used for measuring specific time intervals. Timers can be categorized into two main types. The word "timer" is usually reserved for devices that counts down from a specified time interval, while devices th ...
s.


Synchronization in Linux

Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which ...
provides: * semaphores; *
spinlock In software engineering, a spinlock is a lock that causes a thread trying to acquire it to simply wait in a loop ("spin") while repeatedly checking whether the lock is available. Since the thread remains active but is not performing a useful task, ...
; *
barrier A barrier or barricade is a physical structure which blocks or impedes something. Barrier may also refer to: Places * Barrier, Kentucky, a community in the United States * Barrier, Voerendaal, a place in the municipality of Voerendaal, Netherl ...
s; *
mutex In computer science, a lock or mutex (from mutual exclusion) is a synchronization primitive: a mechanism that enforces limits on access to a resource when there are many threads of execution. A lock is designed to enforce a mutual exclusion concur ...
; *
readers–writer lock In computer science, a readers–writer (single-writer lock, a multi-reader lock, a push lock, or an MRSW lock) is a Synchronization (computer science), synchronization primitive that solves one of the readers–writers problems. An RW lock allo ...
s, for the longer section of codes which are accessed very frequently but don't change very often; *
read-copy-update In computer science, read-copy-update (RCU) is a synchronization mechanism that avoids the use of lock primitives while multiple threads concurrently read and update elements that are linked through pointers and that belong to shared data structur ...
(RCU). Enabling and disabling of kernel preemption replaced spinlocks on uniprocessor systems. Prior to kernel version 2.6,
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which ...
disabled interrupt to implement short critical sections. Since version 2.6 and later, Linux is fully preemptive.


Synchronization in Solaris

Solaris Solaris may refer to: Arts and entertainment Literature, television and film * ''Solaris'' (novel), a 1961 science fiction novel by Stanisław Lem ** ''Solaris'' (1968 film), directed by Boris Nirenburg ** ''Solaris'' (1972 film), directed by ...
provides: * semaphores; *
condition variable In concurrent programming, a monitor is a synchronization construct that allows threads to have both mutual exclusion and the ability to wait (block) for a certain condition to become false. Monitors also have a mechanism for signaling other th ...
s; * adaptive mutexes, binary semaphores that are implemented differently depending upon the conditions; * readers–writer locks: *
turnstiles A turnstile (also called a turnpike, gateline, baffle gate, automated gate, turn gate in some regions) is a form of gate which allows one person to pass at a time. A turnstile can be configured to enforce one-way human traffic. In addition, a ...
, queue of threads which are waiting on acquired lock.


Pthreads synchronization

Pthreads POSIX Threads, commonly known as pthreads, is an execution model that exists independently from a language, as well as a parallel execution model. It allows a program to control multiple different flows of work that overlap in time. Each flow o ...
is a platform-independent
API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how ...
that provides: * mutexes; * condition variables; * readers–writer locks; * spinlocks; *
barrier A barrier or barricade is a physical structure which blocks or impedes something. Barrier may also refer to: Places * Barrier, Kentucky, a community in the United States * Barrier, Voerendaal, a place in the municipality of Voerendaal, Netherl ...
s.


Data synchronization

A distinctly different (but related) concept is that of
data synchronization Data synchronization is the process of establishing consistency between source and target data stores, and the continuous harmonization of the data over time. It is fundamental to a wide variety of applications, including file synchronization and m ...
. This refers to the need to update and keep multiple copies of a set of data coherent with one another or to maintain
data integrity Data integrity is the maintenance of, and the assurance of, data accuracy and consistency over its entire Information Lifecycle Management, life-cycle and is a critical aspect to the design, implementation, and usage of any system that stores, proc ...
, Figure 3. For example, database replication is used to keep multiple copies of data synchronized with database servers that store data in different locations. Examples include: *
File synchronization File synchronization (or syncing) in computing is the process of ensuring that computer files in two or more locations are updated via certain rules. In ''one-way file synchronization'', also called mirroring, updated files are copied from a sourc ...
, such as syncing a hand-held MP3 player to a desktop computer; *
Cluster file system A clustered file system is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system (only direct attached storage for ...
s, which are
file system In computing, file system or filesystem (often abbreviated to fs) is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one larg ...
s that maintain data or indexes in a coherent fashion across a whole
computing cluster A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software. The compo ...
; *
Cache coherency In computer architecture, cache coherence is the uniformity of shared resource data that ends up stored in multiple local caches. When clients in a system maintain caches of a common memory resource, problems may arise with incoherent data, whi ...
, maintaining multiple copies of data in sync across multiple
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache County ...
s; *
RAID Raid, RAID or Raids may refer to: Attack * Raid (military), a sudden attack behind the enemy's lines without the intention of holding ground * Corporate raid, a type of hostile takeover in business * Panty raid, a prankish raid by male college ...
, where data is written in a redundant fashion across multiple disks, so that the loss of any one disk does not lead to a loss of data; *
Database replication Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility. Terminology Replication in comp ...
, where copies of data on a
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases sp ...
are kept in sync, despite possible large geographical separation; * Journaling, a technique used by many modern file systems to make sure that file metadata are updated on a disk in a coherent, consistent manner.


Challenges in data synchronization

Some of the challenges which user may face in data synchronization: * data formats complexity; * real-timeliness; * data security; * data quality; * performance.


Data formats complexity

Data formats tend to grow more complex with time as the organization grows and evolves. This results not only in building simple interfaces between the two applications (source and target), but also in a need to transform the data while passing them to the target application. ETL (extraction transformation loading) tools can be helpful at this stage for managing data format complexities.


Real-timeliness

In real-time systems, customers want to see the current status of their order in e-shop, the current status of a parcel delivery—a real time parcel tracking—, the current balance on their account, etc. This shows the need of a real-time system, which is being updated as well to enable smooth manufacturing process in real-time, e.g., ordering material when enterprise is running out stock, synchronizing customer orders with manufacturing process, etc. From real life, there exist so many examples where real-time processing gives successful and competitive advantage.


Data security

There are no fixed rules and policies to enforce data security. It may vary depending on the system which you are using. Even though the security is maintained correctly in the source system which captures the data, the security and information access privileges must be enforced on the target systems as well to prevent any potential misuse of the information. This is a serious issue and particularly when it comes for handling secret, confidential and personal information. So because of the sensitivity and confidentiality, data transfer and all in-between information must be encrypted.


Data quality

Data quality is another serious constraint. For better management and to maintain good quality of data, the common practice is to store the data at one location and share with different people and different systems and/or applications from different locations. It helps in preventing inconsistencies in the data.


Performance

There are five different phases involved in the data synchronization process: *
data extraction Data extraction is the act or process of retrieving data out of (usually unstructured or poorly structured) data sources for further data processing or data storage ( data migration). The import into the intermediate extracting system is thus usua ...
from the source (or master, or main) system; *
data transfer Data transmission and data reception or, more broadly, data communication or digital communications is the transfer and reception of data in the form of a digital bitstream or a digitized analog signal transmitted over a point-to-point or ...
; *
data transformation In computing, data transformation is the process of converting data from one format or structure into another format or structure. It is a fundamental aspect of most data integrationCIO.com. Agile Comes to Data Integration. Retrieved from: http ...
; * data load to the target system. * data updation Each of these steps is critical. In case of large amounts of data, the synchronization process needs to be carefully planned and executed to avoid any negative impact on performance.


See also

*
Futures and promises In computer science, future, promise, delay, and deferred refer to constructs used for synchronizing program execution in some concurrent programming languages. They describe an object that acts as a proxy for a result that is initially unknown, ...
, synchronization mechanisms in pure functional paradigms


References

*


External links


Anatomy of Linux synchronization methods
at IBM developerWorks
''The Little Book of Semaphores''
by Allen B. Downey

{{Parallel_computing Concurrency (computer science) Computer-mediated communication
Computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
Edsger W. Dijkstra